A Simulation Study Comparing Two Methods Of Evaluating Differential Test Functioning (DTF): DFIT and the Mantel-Haenszel/Liu-Agresti Variance

Authors

  • Charles Hunter
  • Charles Vincent Hunter
Abstract

This study uses simulated data to compare two methods of calculating Differential Test Functioning (DTF): Raju’s DFIT, a parametric method that measures the squared difference between two Test Characteristic Curves (Raju, van der Linden & Fleer, 1995), and a variance estimator based on the Mantel-Haenszel/Liu-Agresti (MH/LA) method, a non-parametric method implemented in the DIFAS program (Penfield, 2005). Most research has been done on Differential Item Functioning (DIF; Pae & Park, 2006), and theory and empirical studies indicate that DTF is the summation of DIF in a test (Donovan, Drasgow & Probst, 2000; Ellis & Mead, 2000; Nandakumar, 1993). Perhaps because of this, measurement of DTF is under-investigated. The study of DTF is nonetheless important for several reasons. From a statistical viewpoint, items, when compared to tests, are small and unreliable samples (Gierl, Bisanz, Bisanz, Boughton, & Khaliq, 2001). As an aggregate measure of DIF, DTF can present an overall view of the effect of differential functioning, even when no single item exhibits significant DIF (Shealy & Stout, 1993b). Decisions about examinees are made at the test level, not the item level (Ellis & Raju, 2003; Jones, 2000; Pae & Park, 2006; Roznowski & Reith, 1999; Zumbo, 2003). Overall, both methods performed as expected, with some exceptions. DTF tended to increase with DIF magnitude and with sample size. The MH/LA method generally showed greater rates of DTF than DFIT. It was also especially sensitive to group distribution differences (impact), identifying them as DTF where DFIT did not. An empirical cutoff value seemed to work as a method of determining statistical significance for the MH/LA method. Plots of the MH/LA DTF indicator showed a tendency towards an F-distribution for equal reference and focal group sizes, and a normal distribution for unequal sample sizes. Areas for future research are identified.
INDEX WORDS: DTF, Differential Test Functioning, DFIT, Mantel-Haenszel

A SIMULATION STUDY COMPARING TWO METHODS OF EVALUATING DIFFERENTIAL TEST FUNCTIONING (DTF): DFIT AND THE MANTEL-HAENSZEL/LIU-AGRESTI VARIANCE by Charles Vincent Hunter, Jr.
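
The abstract describes DFIT as measuring the squared difference between two Test Characteristic Curves. A minimal sketch of that idea follows; it is not the study's code. It assumes a two-parameter logistic (2PL) item model with no scaling constant, averages the squared difference over a simulated focal-group ability sample, and uses hypothetical names (`tcc`, `dtf_index`) and made-up item parameters chosen purely for illustration.

```python
import math
import random


def tcc(theta, a, b):
    """Test characteristic curve: expected number-correct score at ability theta.

    Assumes a 2PL model; a and b are per-item discrimination and difficulty lists.
    """
    return sum(1.0 / (1.0 + math.exp(-ai * (theta - bi))) for ai, bi in zip(a, b))


def dtf_index(thetas_focal, ref_params, foc_params):
    """Mean squared difference between focal and reference TCCs,
    averaged over a sample of focal-group abilities."""
    a_ref, b_ref = ref_params
    a_foc, b_foc = foc_params
    squared = [
        (tcc(t, a_foc, b_foc) - tcc(t, a_ref, b_ref)) ** 2 for t in thetas_focal
    ]
    return sum(squared) / len(squared)


# Illustrative (made-up) setup: four items, focal abilities drawn from N(0, 1).
random.seed(0)
thetas = [random.gauss(0.0, 1.0) for _ in range(1000)]
a = [1.0, 1.2, 0.8, 1.5]
b_ref = [-1.0, 0.0, 0.5, 1.0]
b_small = [-1.0, 0.5, 0.5, 1.0]  # item 2 is 0.5 harder for the focal group
b_large = [-1.0, 1.0, 0.5, 1.0]  # item 2 is 1.0 harder for the focal group

dtf_none = dtf_index(thetas, (a, b_ref), (a, b_ref))
dtf_small = dtf_index(thetas, (a, b_ref), (a, b_small))
dtf_large = dtf_index(thetas, (a, b_ref), (a, b_large))
```

In this toy setup the index is zero when both groups share item parameters and grows as the difficulty shift on the single DIF item grows, which is in line with the abstract's observation that DTF tended to increase with DIF magnitude. Significance testing and the MH/LA variance estimator are outside the scope of this sketch.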

Similar resources

Differential item functioning procedures for polytomous items when examinee sample sizes are small

As part of test score validity, differential item functioning (DIF) is a quantitative characteristic used to evaluate potential item bias. In applications where a small number of examinees take a test, statistical power of DIF detection methods may be affected. Researchers have proposed modifications to DIF detection methods to account for small focal group examinee sizes for the case when item...

A new approach for differential item functioning detection using Mantel-Haenszel methods. The GMHDIF program.

To date, the statistical software designed for assessing differential item functioning (DIF) with Mantel-Haenszel procedures has employed the following statistics: the Mantel-Haenszel chi-square statistic, the generalized Mantel-Haenszel test and the Mantel test. These statistics permit detecting DIF in dichotomous and polytomous items, although they limit the analysis to two groups. On the con...

Alternate Cutoff Values and DFIT Tests of Measurement Invariance

Likert scales are routinely used in educational and psychological research as measures of constructs of interest. If sound scale development procedures are followed, the resulting scale can reliably and validly measure a construct. However, if a given scale is used to make comparisons among different populations of respondents (e.g., cultures; Riordan & Vandenberg, 1994), over time in longitudi...

Sensitivity of DFIT Tests of Measurement Invariance for Likert Data

Likert scales are routinely used in educational and psychological research as measures of constructs of interest. If sound scale development procedures are followed, the resulting scale can reliably and validly measure a construct. However, if a given scale is used to make comparisons among different populations of respondents (e.g., cultures; Riordan & Vandenberg, 1994), over time in longitudi...

Academic Discipline DIF in an English Language Proficiency Test

The purpose of this study was to detect differentially functioning items in the University of Tehran English Proficiency Test (UTEPT), a high-stakes test of English developed and administered by the Language Testing Centre of the University of Tehran. This paper is based on the answers of 400 test takers to the test. All participants earned a master's degree either in humanities or science...

Publication date: 2015